Draft version November 28, 2012 

Preprint typeset using LaTeX style emulateapj v. 12/16/11






THE JHU-SDSS METAL ABSORPTION LINE CATALOG: 
REDSHIFT EVOLUTION AND PROPERTIES OF Mg II ABSORBERS 

GUANGTUN ZHU & BRICE MÉNARD

ABSTRACT 

We present a generic and fully-automatic method aimed at detecting absorption lines in the spectra 
of astronomical objects. The algorithm estimates the source continuum flux using a dimensionality re- 
duction technique, nonnegative matrix factorization, and then detects and identifies metal absorption 
lines. We apply it to a sample of ~10^5 quasar spectra from the Sloan Digital Sky Survey and compile
a sample of ~ 40,000 Mg II & Fe II absorber systems, spanning the redshift range 0.4 < z < 2.3. 
The corresponding catalog is publicly available. We study the statistical properties of these absorber 
systems and find that the rest equivalent width distribution of strong Mg II absorbers follows an ex- 
ponential distribution at all redshifts, confirming previous studies. Combining our results with recent 
near-infrared observations of Mg II absorbers we introduce a new parametrization that fully describes 
the incidence rate of these systems up to z ~ 5. We find the redshift evolution of strong Mg II
absorbers to be remarkably similar to the cosmic star formation history over 0.4 < z < 5.5 (the entire 
redshift range covered by observations), suggesting a physical link between these two quantities. 

Subject headings: quasars: absorption lines - galaxies: evolution - galaxies: halos - intergalactic 
medium 



1. INTRODUCTION 

Metal absorption lines detected in the spectra of distant sources provide us with a
powerful tool to probe the gas content of the Universe: their detectability depends on
neither the redshift nor the apparent luminosity of the corresponding object. They can,
for example, be used to shed light on gas flows around galaxies. From an observational
point of view the Mg II λλ2796, 2803 doublet is of particular interest: it is the
strongest absorption feature detectable in the optical at intermediate redshift
(0.3 < z < 2.5). It allows us to probe low-ionization gas present in the circum- and
intergalactic media. Numerous Mg II surveys have been conducted (e.g., Weymann et al.
1979; Lanzetta et al. 1987; Tytler et al. 1987; Sargent et al. 1988; Caulet 1989;
Steidel & Sargent 1992; Churchill et al. 1999, 2000a; York et al. 2006; Nestor et al.
2005; Prochter et al. 2006; Quider et al. 2011). They have shown that weak
($W_0$ < 0.3 Å) and strong ($W_0$ > 0.3 Å) Mg II absorbers have different statistical
properties, but the nature of the absorbing gas is still debated (e.g., Bergeron &
Boissé 1991; Steidel et al. 1994; Norman et al. 1996; Churchill et al. 2000b; Bouché
et al. 2007; Chelouche & Bowen 2010; Chen et al. 2010a,b; Kacprzak et al. 2010,
2011a,b; Nestor et al. 2011; Ménard et al. 2011; Bordoloi et al. 2011, among others).

The Sloan Digital Sky Survey (SDSS; York et al. 2000) has provided us with a sample of
more than 100,000 quasar spectra well suited for the detection of intervening absorber
systems. Previous works have made use of certain data releases to detect Mg II
absorption lines with various levels of completeness and purity (Nestor et al. 2005;
York et al. 2006; Bouché et al. 2006; Prochter et al. 2006; Lundgren et al. 2009;
Quider et al. 2011). In the era of large sky surveys it is important to develop
efficient algorithms that automatically detect absorption-line systems in quasar
spectra, in order to take advantage of the ever-growing data. In this paper we present
such an algorithm and, applying it to the seventh Data Release of the SDSS (DR7;
Abazajian et al. 2009), we present the detection of ∼40,000 Mg II absorbers at
0.4 < z < 2.3. With this new dataset, we study the absorber incidence rate as a
function of redshift and rest equivalent width. The method presented in this paper is
generic and can be applied to any large sample of spectra.

The paper proceeds as follows: in Section 2 we describe the dataset and the algorithm.
We present the catalog of detected Mg II absorbers in Section 3 and their statistical
properties in Section 4. We summarize our results in Section 5.

2. THE ABSORPTION-LINE DETECTION ALGORITHM 

Detecting absorption lines in the spectrum of a source requires two essential steps:
(i) estimating the continuum intrinsic to the source, and (ii) detecting departures
from the continuum estimate. Here we describe an algorithm performing those tasks and
apply it to the SDSS quasar catalog (Schneider et al. 2010). This catalog includes
105,783 spectroscopically confirmed quasars. We use the quasar redshift estimates
provided by Hewett & Wild (2010).⁴ Besides the quasars in the DR7 catalog, Hewett &
Wild (2010) also include 1411 additional visually inspected quasars, which we treat in
the same manner below.

2.1. Continuum estimation 

⁴ http://das.sdss.org/va/Hewett_Wild_dr7qso_newz/






2.1.1. NMF eigenspectra 

Studies using principal component analysis (PCA; e.g., Connolly et al. 1995) have shown
that quasar spectra reside in a low-dimensional subspace (e.g., Yip et al. 2004; Wild
et al. 2006). The continuum of a given quasar can therefore be described by a linear
combination of a small basis set of eigenspectra. To define such a set, we use the
technique of nonnegative matrix factorization (NMF; Lee & Seung 1999; Blanton & Roweis
2007). Given a set of spectra, NMF defines a basis set of nonnegative eigenspectra.
This approach is motivated by the nonnegativity of the components representing an
observed quasar spectrum: continuum, emission lines and the flux of the host galaxy.
In this work we choose to limit the dimensionality of the eigenspectra to twelve. We
find this value to be sufficient to capture most of the variation in the shapes of
SDSS quasar spectra. We note that this value can be increased or decreased by a few
without significantly changing the results of our analysis. Working with
$N_{\rm dim} \lesssim 10$ is not sufficient to capture all the complexity of quasar
spectra, while a very high number of dimensions sometimes provides enough flexibility
to include intervening absorption lines in the source continuum estimate.
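
As an illustration of the factorization step, the following is a minimal sketch of NMF using the multiplicative update rules of Lee & Seung. It is not the pipeline's implementation: it assumes a complete, unweighted data matrix (real spectra need per-pixel noise weights and masks), and the function name `nmf` and its arguments are hypothetical.

```python
import numpy as np

def nmf(V, n_components=12, n_iter=200, seed=0, eps=1e-12):
    """Factor a nonnegative matrix V (n_spectra x n_pixels) as H @ W.

    H holds the nonnegative coefficients of each spectrum and W holds the
    nonnegative eigenspectra. Unweighted toy version (assumption).
    """
    rng = np.random.default_rng(seed)
    n_spec, n_pix = V.shape
    H = rng.random((n_spec, n_components)) + eps
    W = rng.random((n_components, n_pix)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        H *= (V @ W.T) / (H @ W @ W.T + eps)
        W *= (H.T @ V) / (H.T @ H @ W + eps)
    return H, W
```

Because the updates are multiplicative, entries that start positive never change sign, which is what enforces the nonnegativity discussed above.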

Using the sample of quasar spectra introduced above 
we construct a basis set of NMF eigenspectra using rest- 
frame flux-normalized spectra. As we access different 
rest-frame wavelength ranges as a function of quasar red- 
shift we cannot apply a uniform normalization. We have 
chosen four wavelength ranges in which quasar spectra 
are relatively featureless and which are enough to charac- 
terize the whole range of quasar redshifts in our sample. 
These regions as well as a description of our normaliza- 
tion scheme are presented in Appendix A. For each of 
these four redshift ranges we create a basis set of eigen- 
spectra using all corresponding quasars. When estimat- 
ing the continuum of a given quasar, we choose the set 
of eigenspectra whose median redshift is closest to the 
quasar's redshift. This guarantees that each quasar spec- 
trum is described by a set of eigenspectra built from the 
maximal number of available quasars covering the same 
wavelengths. 

The presence of strongly dust-reddened quasars, broad 
absorption lines (BALs) or various spectroscopic artifacts 
can affect the eigenspectra estimation. In order to ac- 
count for such outliers, we take an iterative approach. 
After having decomposed the spectra of all quasars into 
eigenvectors we keep only those for which the eigenvalues
lie within 5σ of the mean eigenvalues of all input
quasars. We iterate this process until no outlier is found. 
The code usually converges after < 10 iterations and en- 
sures that the construction of the eigenspectra does not 
include peculiar objects. 
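
The iterative rejection can be sketched as follows. This is a simplified illustration under stated assumptions: it clips a fixed table of coefficients, whereas the procedure described above also rebuilds the eigenspectra at each iteration. The name `clip_outliers` and its arguments are hypothetical.

```python
import numpy as np

def clip_outliers(coeffs, n_sigma=5.0, max_iter=50):
    """Return a boolean mask of the rows kept after iterative clipping.

    coeffs: array (n_quasars x n_components) of decomposition coefficients.
    A row is dropped if any coefficient lies more than n_sigma standard
    deviations from the mean of the currently kept rows.
    """
    keep = np.ones(len(coeffs), dtype=bool)
    for _ in range(max_iter):
        mean = coeffs[keep].mean(axis=0)
        std = coeffs[keep].std(axis=0)
        inside = np.all(np.abs(coeffs - mean) <= n_sigma * std, axis=1)
        new_keep = keep & inside
        if new_keep.sum() == keep.sum():  # no new outlier: converged
            break
        keep = new_keep
    return keep
```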

Once the basis set of eigenspectra has been defined in 
each of the four redshift intervals, we estimate the con- 
tinuum of all quasars by finding the best-fit nonnegative 
linear combination of the eigenspectra. We present two 
examples of the NMF fitting in the top panels of Figures
1 and 2.

2.1.2. Filtering out intermediate-scale fluctuations 

The NMF continuum estimation captures mostly the large-scale fluctuations of a quasar
spectrum. As we are interested in detecting narrow absorption lines we can improve our
continuum estimation by removing power on intermediate scales. To achieve this, we
apply a median filter with a size larger than the typical absorption width. The
instrumental resolution of SDSS spectra is about 69 km s⁻¹. The wavelength binning of
the spectra is logarithmic (i.e., uniform in velocity), with one pixel matching the
spectral resolution. The two Mg II λλ2796, 2803 lines are separated by 7.18 Å in the
rest frame, which translates to ∼11 pixels in the observer frame. The full width at
half maximum (FWHM) of each line, convolved with the SDSS instrumental resolution, can
reach up to 500 km s⁻¹ (∼7 pixels). The overall coverage of a Mg II doublet is thus
about 18 pixels in the SDSS spectra. We first apply an intermediate-scale median
filter with a size of 141 pixels (about eight times the size of a strong Mg II
absorber system), then remove smaller-scale power by applying a filter with a size of
71 pixels. While doing so we mask out pixels possibly containing narrow absorption
lines by only keeping fluctuations within 1.5σ of the continuum. We repeat these two
steps three times, which we found to be sufficient for the estimation of the median
continuum to converge.
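
The two-scale filtering can be sketched as below. This is one possible reading of the procedure, not the pipeline code: it assumes the input is the NMF-normalized residual, treats `sigma` as the noise of that residual, and multiplies successive median estimates together. Function names are hypothetical.

```python
import numpy as np

def masked_running_median(values, good, size):
    """Running median over a window of `size` pixels using only good pixels."""
    half = size // 2
    out = np.ones_like(values, dtype=float)
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        window = values[lo:hi][good[lo:hi]]
        if window.size:
            out[i] = np.median(window)
    return out

def median_continuum(residual, sigma, sizes=(141, 71), n_sigma=1.5, n_repeat=3):
    """Estimate intermediate-scale structure left in an NMF residual."""
    cont = np.ones_like(residual, dtype=float)
    for _ in range(n_repeat):
        for size in sizes:
            ratio = residual / cont
            # Mask candidate absorption (or emission) pixels before filtering.
            good = np.abs(ratio - 1.0) <= n_sigma * sigma
            cont = cont * masked_running_median(ratio, good, size)
    return cont
```

Dividing the residual by the returned array gives the final normalized spectrum; narrow features deeper than the mask threshold are left untouched because they never enter the median.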

In the middle panels of Figures 1 and 2 we show the
median-filtered NMF residuals. The median filtering 
captures the fluctuations on intermediate scales while 
preserving the narrow absorption lines. In the bottom 
panels we show the final residuals, i.e., the spectra nor- 
malized by the products of the NMF continua and the 
median continua. We also label the prominent narrow 
absorption lines based on the Mg II absorbers we find in 
these two spectra. We describe the line detection method 
in Section 2.2.

2.1.3. Sky emission and galactic absorption 

SDSS spectra contain features due to sky emission lines 
such as the O I λ5577 and OH lines that are not properly
subtracted. This can be seen in Figure 2. These features
do not substantially affect our estimation of the quasar 
continua which is done in the quasar rest frame. How- 
ever, care is needed when detecting narrow absorption 
lines and estimating their completeness. 

To quantify this effect we stack all the residual spectra in the observer frame. The
result is shown in Figure 3. It shows that the SDSS pipeline on average
under-subtracts sky emission lines: O I λ5577, O I λ6300, and the OH lines.⁵ In
addition, the Ca II λλ3934, 3969 and Na D absorption lines induced by the interstellar
medium (ISM) of the Milky Way are clearly visible. When searching for narrow
absorption lines we need to exclude the Ca II region, which may introduce false
positives. When determining the completeness of our pipeline, we also need to account
for all these potential sky residuals. We will come back to this in Section 3.3.
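
A stack of this kind can be sketched as a per-pixel median over all residual spectra on the common observer-frame grid. The text does not specify whether a mean or a median is used, so the median here is an assumption, and `observer_frame_composite` is a hypothetical name.

```python
import numpy as np

def observer_frame_composite(residuals, good):
    """Median-stack residual spectra that share one observer-frame grid.

    residuals: array (n_spectra x n_pixels) of continuum-normalized fluxes.
    good: boolean array of the same shape; False marks unusable pixels.
    """
    masked = np.where(good, residuals, np.nan)
    return np.nanmedian(masked, axis=0)
```

Features fixed in the observer frame (sky lines, Galactic Ca II and Na D) add up coherently, whereas intervening absorbers fall at different pixels in each spectrum and drop out of the median.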

2.1.4. Outlier rejection 

A fraction of quasar spectra contain BALs which com- 
plicate the continuum estimation. To identify them we 
measure the variance of each quasar flux residual and

⁵ See also Yan (2011). This effect is known to the SDSS reduction
team and the noise estimate of the flux is enhanced accordingly
(David Schlegel, private communication).
Fig. 1. Illustration of the steps involved in the absorption-line detection pipeline. Top panel: the black line shows the normalized
observed spectrum for quasar SDSS J001602.40-001225.0. The blue line represents the best-fit NMF continuum and the red line shows
the final continuum estimate after median filtering. The search window for Mg II absorbers is indicated at the top. The filled green region
shows the region where an absorber is considered as intervening. Middle panel: the black line shows the NMF residual spectrum, i.e., the
ratio of the observed spectrum to the NMF continuum estimate. The orange line shows the median continuum estimate. Bottom panel:
final residual spectrum used for the absorption line detection. In this case an absorber at z ∼ 1.972 is detected from a series of metal
absorption lines.

exclude objects for which the value is significantly larger than that of the overall
population. This procedure removes 16,704 objects from our sample. We note that this
can in principle reject quasars whose spectra host a very large number (≳ 10) of
strong Mg II absorbers. These systems, however, are extremely rare, and should not
have any practical effect on our survey. Due to catastrophic errors and gaps in the
data, a small fraction of the quasar spectra (55 objects) present fewer than 5 valid
pixels in the wavelength ranges used for flux normalization. Such objects are not
included in our analysis. Beyond z = 4.7, we have only 219 quasars and cannot build a
well-defined basis set of eigenspectra. We do not consider these high-redshift
quasars. We also exclude 5682 quasars with z < 0.4, which cannot be used to look for
Mg II absorption. This leaves 84,534 quasars well suited for narrow absorption-line
detection.
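
The sample bookkeeping above can be checked with a few lines of arithmetic. The sketch assumes the four cuts are applied to disjoint sets of objects, which is what the quoted totals imply; the variable names are illustrative.

```python
# Quasar counts quoted in the text.
dr7_catalog = 105_783          # Schneider et al. (2010) catalog
extra_visual = 1_411           # additional quasars from Hewett & Wild (2010)
total = dr7_catalog + extra_visual

removed = {
    "large residual variance (e.g., BALs)": 16_704,
    "fewer than 5 valid normalization pixels": 55,
    "z > 4.7": 219,
    "z < 0.4": 5_682,
}

# Assumption: the cuts do not overlap, so they can simply be summed.
remaining = total - sum(removed.values())
print(total, remaining)
```

The two printed numbers reproduce the 107,194 input quasars and the 84,534 quasars retained for the search.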

2.2. Absorption line detection 

Having compiled a set of continuum-normalized quasar 
fluxes we now detect, identify and characterize narrow 
absorption lines. Our procedure includes three steps: (1)
candidate selection; (2) false positive elimination; and (3)
equivalent-width measurement.



2.2.1. Search window 

For a given quasar spectrum the redshift range in which 
we search for absorbers is constrained by several factors: 
the wavelength coverage of the SDSS spectrum, the red- 
shift of the quasar, and the capability of the detection 
method to differentiate between different types of ab- 
sorbers. 

Mg II absorbers with $z_{\rm abs} \sim z_{\rm QSO}$ are likely physically associated
with their background quasar. Associated absorbers can be either blueshifted or
redshifted (e.g., Vanden Berk et al. 2008; Shen & Ménard 2012). Although we are
primarily interested in intervening Mg II absorbers associated with foreground
sources, we extend the search window redward of the quasar redshift by Δz = 0.04
(12,000 km s⁻¹) to include these quasar-associated absorbers. At wavelengths blueward
of the quasar C IV emission line, the covering fraction of intervening C IV absorbers
is substantially higher than that of Mg II absorbers. A doublet found close to or
blueward of the C IV emission line has a higher probability of being C IV than Mg II.
In this Mg II-based survey, we thus do not consider the region blueward of the
quasar's C IV line, leaving the C IV-Mg II discrimination to future work. Since C IV
absorption lines can also be redshifted, we conservatively start the search window
redward of C IV by Δz = 0.02 (6000 km s⁻¹).

Fig. 2. Same as Figure 1 but for an observation severely affected by poor removal of sky emission lines (SDSS J044129.02-053646.8, z = 1.648).
The residual O I λ6300 and OH lines in the red are conspicuous in the observed spectrum (top panel). Our continuum fitting is, however,
not strongly affected by the presence of these features.

Fig. 3. The composite residual spectrum of all quasars in the observer frame. This shows that the spectral extraction of SDSS on
average underestimates sky emission lines, e.g., O I λ5577, O I λ6300, and OH lines. The absorption lines Ca II and Na D are caused
mainly by the interstellar medium in the Milky Way.

The ISM in the Milky Way can also cause absorption lines in the spectra of
extragalactic sources (Figure 3). Our experience shows that in some cases the
Ca II λλ3934, 3969 lines from the Milky Way can mimic a z ∼ 0.4 Mg II doublet. To
avoid the introduction of such false positives, we mask out the Ca II region in our
search window for Mg II absorbers.

Our final search window in each quasar spectrum therefore starts from Δz = 0.02
redshifted from the quasar's C IV emission line or the blue end of the SDSS coverage
(∼3800 Å), and ends at Δz = 0.04 redshifted from the quasar's Mg II emission line or
the red end of the SDSS coverage (∼9200 Å), excluding the observer-frame Ca II
regions. In the top panels of Figures 1 and 2 we show the search windows of the given
examples.
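
The window definition can be turned into absorber-redshift limits as sketched below. Several choices here are assumptions rather than statements from the text: the Δz offsets are read as additive offsets to the quasar redshift, the rest wavelengths are approximate vacuum values, and the Galactic Ca II mask is omitted. `search_window` is a hypothetical name.

```python
MGII_2796 = 2796.35        # Å, approximate vacuum rest wavelength
CIV_EMISSION = 1549.0      # Å, approximate centre of the C IV emission line
SDSS_BLUE, SDSS_RED = 3800.0, 9200.0   # Å, approximate SDSS coverage

def search_window(z_qso, dz_blue=0.02, dz_red=0.04):
    """Return (z_min, z_max) for the Mg II 2796 line of a given quasar."""
    # Blue limit: redward of C IV by dz_blue, or the blue end of the spectrum.
    wave_min = max(SDSS_BLUE, CIV_EMISSION * (1.0 + z_qso + dz_blue))
    # Red limit: dz_red beyond the quasar redshift, or the red end.
    wave_max = min(SDSS_RED, MGII_2796 * (1.0 + z_qso + dz_red))
    return wave_min / MGII_2796 - 1.0, wave_max / MGII_2796 - 1.0
```

With these numbers the blue and red ends of the SDSS coverage correspond to Mg II redshifts of about 0.36 and 2.29, matching the absorber redshift range quoted for the catalog.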

2.2.2. First pass: candidate selection 

The first step in the line detection is to select a list of absorption line
candidates. To do so we use a multi-line model including Mg II λλ2796, 2803 and four
strong Fe II lines: λ2344, λ2383, λ2586, and λ2600. The inclusion of these Fe II lines
facilitates the elimination of false positives (see next sub-section). We then perform
a matched-filter search for candidates detected above a certain signal-to-noise ratio
(SNR) threshold. Within the search window of a given quasar, we convolve the residuals
and the noise estimates with the multi-line model using top-hat filters with a width
of 4 pixels (276 km s⁻¹) for each line. This is motivated by the typical FWHM of the
absorption lines which, convolved with the SDSS instrumental resolution, is
∼100-400 km s⁻¹ (∼2-6 pixels). For each quasar, we perform the convolution at every
potential absorber redshift, given by all the pixels within the search window. We then
select absorber candidates at pixels that satisfy Criterion_MgII:

    SNR(Mg II λ2796) > 4  &&  SNR(Mg II λ2803) > 2.    (1)

This criterion determines the window function of our search. Finally, candidates at
contiguous redshifts/pixels are grouped together and treated as one single candidate
at their median redshift.
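
The first pass can be sketched for the Mg II doublet alone as follows. The SNR definition used here (summed absorbed flux over the top-hat divided by the summed noise in quadrature) is an assumption, the Fe II part of the multi-line model is left out, and the function names are hypothetical.

```python
import numpy as np

C_KMS = 299792.458
PIX_KMS = 69.0                      # SDSS pixel size in velocity
MGII = (2796.35, 2803.53)           # Å, approximate vacuum rest wavelengths

def tophat_snr(residual, sigma, width=4):
    """SNR of absorption summed over a top-hat of `width` pixels."""
    kernel = np.ones(width)
    signal = np.convolve(1.0 - residual, kernel, mode="same")
    noise = np.sqrt(np.convolve(sigma**2, kernel, mode="same"))
    return signal / noise

def mgii_candidates(residual, sigma, width=4, thresholds=(4.0, 2.0)):
    """Pixels of the 2796 line passing Eq. (1), grouped when contiguous."""
    snr = tophat_snr(residual, sigma, width)
    # Doublet separation in log-wavelength pixels (about 11).
    offset = int(round(np.log(MGII[1] / MGII[0]) * C_KMS / PIX_KMS))
    n = len(residual) - offset
    passed = (snr[:n] > thresholds[0]) & (snr[offset:offset + n] > thresholds[1])
    pixels = np.flatnonzero(passed)
    if pixels.size == 0:
        return []
    groups = np.split(pixels, np.flatnonzero(np.diff(pixels) > 1) + 1)
    return [int(np.median(group)) for group in groups]
```

Because the spectra are binned uniformly in velocity, the doublet separation is the same number of pixels at every redshift, so a single pixel offset suffices.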

2.2.3. Second pass: false positive elimination 

Once we have a list of absorber candidates that passed Criterion_MgII, we take the
following steps to eliminate false positives:

(i) We fit each Mg II λλ2796, 2803 doublet candidate with a double-Gaussian profile
and reject candidates with peculiar separations between the two Gaussians. In the
fitting, we assume the dispersions of the two Gaussians to be the same but allow their
centers and amplitudes to differ. We reject a candidate if the separation between the
two Gaussians differs from the fiducial value by more than 1 Å. Experiments show that
the exact value of this criterion has little effect and that this method efficiently
eliminates the majority of false positives.

(ii) To strengthen the identification of a Mg II absorber we make use of the Fe II
lines. For each quasar, we compare every pair of remaining candidates and examine
whether any of the Fe II lines from one candidate is at the same wavelength as any of
the Mg II lines from the other. If so, we rank the two candidates by the average SNR
of four absorption lines: Mg II λλ2796, 2803, Fe II λ2600, and Fe II λ2586. We then
keep the one with the higher average SNR unless the other one has all four Fe II lines
(λ2344, λ2383, λ2586, and λ2600) detected above 2σ, i.e., unless it satisfies
Criterion_FeII:

    SNR(Fe II λ2344, λ2383, λ2586, & λ2600) > 2,    (2)

in which case we keep both. Since a false positive caused by line confusion does not
have other lines at the right wavelengths, it has a lower average SNR of the four
lines, and this method therefore efficiently eliminates the remaining line confusions.

(iii) In some rare cases the C III emission line of some quasars is not properly
modeled by our procedure and gives rise to absorption-like features in the residuals.
To eliminate these false positives, in addition to Criterion_MgII we require a
detection of at least one of the four Fe II lines above 2σ if a candidate lies within
Δz = ±0.02 of its host's C III line. This additional criterion decreases the
completeness for Mg II absorption systems with weak Fe II lines in C III regions.
When evaluating the completeness of our survey (Section 3.3), we will exclude this
region.
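
The line-confusion test of step (ii) amounts to comparing observed wavelengths, as in the sketch below. The velocity tolerance (one 4-pixel filter width) and the rest wavelengths are assumptions chosen for illustration, and `lines_overlap` is a hypothetical helper.

```python
import math

C_KMS = 299792.458
MGII = (2796.35, 2803.53)                      # Å, approximate vacuum values
FEII = (2344.21, 2382.77, 2586.65, 2600.17)    # Å, approximate vacuum values

def lines_overlap(z_a, z_b, tol_kms=276.0):
    """True if an Fe II line of one candidate coincides with an Mg II line
    of the other, within tol_kms in velocity."""
    for z_fe, z_mg in ((z_a, z_b), (z_b, z_a)):
        for fe in FEII:
            for mg in MGII:
                ratio = fe * (1.0 + z_fe) / (mg * (1.0 + z_mg))
                if C_KMS * abs(math.log(ratio)) <= tol_kms:
                    return True
    return False
```

For example, the Fe II λ2600 line of a system at z = 1.0 falls near 5200 Å, where it could be mistaken for Mg II λ2796 at z ≈ 0.86; such a pair would be flagged and then ranked by average SNR.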

2.2.4. Final pass: absorber properties 

Having a list of robust Mg II absorber systems, we now 
determine their redshift and line properties. We estimate 
the rest equivalent width of each available absorption 
line by fitting a Gaussian profile. When two line profiles 
overlap we perform the fit with a double-Gaussian profile 
to prevent biases in the rest equivalent width estimation. 
This procedure also allows us to estimate the redshift 
and rest equivalent widths of each Mg II system more 
precisely than done in the first pass. 
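
For a Gaussian absorption profile in a continuum-normalized spectrum, the equivalent width follows analytically from the fitted depth and width. A minimal sketch, assuming the fit is done in observed wavelength (the parameter names are illustrative):

```python
import math

def rest_equivalent_width(depth, sigma_obs, z_abs):
    """Rest equivalent width (Å) of a Gaussian absorption line.

    depth: fractional depth at line centre (0 to 1).
    sigma_obs: Gaussian dispersion in observed wavelength (Å).
    The observed width is depth * sigma_obs * sqrt(2*pi); dividing by
    (1 + z_abs) converts it to the absorber rest frame.
    """
    return depth * sigma_obs * math.sqrt(2.0 * math.pi) / (1.0 + z_abs)
```

A line with depth 0.5 and an observed dispersion of 2 Å at z = 1 therefore has a rest equivalent width of about 1.25 Å.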

3. THE Mg II ABSORBER SAMPLE 

We ran our line detection pipeline on the 84,534 (out of 107,194) quasars suitable
for narrow absorption line detection. Within the search window, we detect 40,429
Mg II absorbers. The spatial distributions of the quasars and absorbers are shown in
the left panel of Figure 4 with orange and blue points, respectively. The
corresponding redshift distributions are shown in the right panel.

In this section, we will focus on so-called intervening absorbers. We conservatively
define such systems to be blueshifted from their background quasar by at least
Δz = 0.04, which corresponds to Δv ∼ 12,000 km s⁻¹. This absorber sample has 35,752
objects. We now characterize its completeness and purity.

3.1. Comparison with the Pittsburgh Catalog 

Prior to this work the largest compilation of Mg II absorbers was the Pittsburgh
catalog based on the SDSS DR4 dataset (Quider et al. 2011),⁶ using the detection
method presented in Nestor et al. (2005). To ensure a high purity and completeness of
the absorber detection these authors visually inspected the quasar flux residuals.
This sample therefore provides us with a good test bed for our pipeline. There are
41,881 common quasars

⁶ http://enki.phyast.pitt.edu/PittSDSSMgIIcat.php
Fig. 4. Left: pie diagram of the quasars (blue) and Mg II absorbers (orange) in the RA-z space. For clarity we only show quasars with
a detected absorber. Right: redshift distributions of all 107,194 quasars and 40,429 Mg II absorbers, spanning redshifts from z = 0.36 to
z = 2.29.



searched for Mg II doublets in both surveys. Within the search window, we detected
18,748 Mg II absorbers with $W_0^{\lambda 2796}$ > 0.02 Å, while the Pittsburgh group
detected 14,715. Among these 14,715 detections, we recovered 14,079 (∼95%). The
remaining ∼5% did not pass our Criterion_MgII due to noise or masks. As this effect is
taken into account in our completeness estimation, these missing systems do not bias
any statistical analysis. In addition, we detected 4669 systems (∼25%) that are not
included in the Pittsburgh catalog but are fully consistent with Mg II absorbers.⁷ In
Appendix B, we carefully inspect these systems and demonstrate that they are bona fide
Mg II absorbers. In Figure 5 we compare the rest equivalent width measurements,
$W_0^{\lambda 2796}$ and $W_0^{\lambda 2803}$, for common absorbers in both catalogs.
The two rest equivalent width distributions appear to be consistent and the scatter of
the residuals is comparable to the typical measurement error.
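
The percentages quoted in this comparison follow directly from the detection counts, as the short check below shows (variable names are illustrative).

```python
# Detection counts for the 41,881 quasars searched by both surveys.
pitt_detections = 14_715
recovered = 14_079           # Pittsburgh systems also found by this pipeline
ours = 18_748
new_only = 4_669             # systems absent from the Pittsburgh catalog

recovered_fraction = recovered / pitt_detections   # about 0.957
new_fraction = new_only / ours                     # about 0.249
print(f"{recovered_fraction:.3f} {new_fraction:.3f}")
```

Note that the recovered and new systems add up exactly to the 18,748 detections, so every detection in the common sample falls into one of the two groups.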

3.2. Properties 

The saturation level of the Mg II doublet is a valuable indicator. The
$W_0^{\lambda 2796}/W_0^{\lambda 2803}$ ratio is expected to be bounded between 2
(optically thin regime) and 1 (saturated). We show the doublet ratio distribution of
our catalog as a function of $W_0^{\lambda 2796}$ in the left panel of Figure 6. For
comparison, we also overplot the expected minimum and maximum values with horizontal
dashed lines. The distribution shows that most of the doublets are saturated,
especially at $W_0^{\lambda 2796}$ > 1 Å. The rest equivalent widths of most of these
doublets therefore measure the kinematics of the ionized gas. The fraction of
unsaturated absorbers increases towards the weaker end.

⁷ It is known that some absorbers were missed due to human errors and will be
included in their future data release (Daniel Nestor, private communication).

In the right panel of Figure 6 we show the measured Gaussian velocity dispersion as a
function of $W_0^{\lambda 2796}$. We have removed the SDSS instrumental resolution of
69 km s⁻¹ by quadrature subtraction. Some systems end up with a negative squared
velocity dispersion due to the noise level. They are not shown in the figure. The
Gaussian velocity dispersion scales nearly linearly with $W_0^{\lambda 2796}$,
especially at the strong end. This is expected since equivalent width primarily
measures the velocity spread of the gas.
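
The quadrature subtraction mentioned above can be written as a one-line correction. The sketch treats the 69 km s⁻¹ resolution as a Gaussian dispersion, which is how the text uses it, and returns NaN when noise makes the corrected variance negative; `intrinsic_dispersion` is a hypothetical name.

```python
import math

def intrinsic_dispersion(sigma_obs_kms, sigma_inst_kms=69.0):
    """Remove the instrumental dispersion from a measured one in quadrature."""
    variance = sigma_obs_kms**2 - sigma_inst_kms**2
    # A negative variance means the line is unresolved within the noise.
    return math.sqrt(variance) if variance >= 0 else float("nan")
```

A measured dispersion of 100 km s⁻¹ corresponds to an intrinsic value of about 72 km s⁻¹, while any measurement below 69 km s⁻¹ has no real solution and is dropped from the figure.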

3.3. Completeness 

The detection of an absorber with a given rest equiv- 
alent width, doublet ratio, and redshift depends on the 
accuracy with which the source continuum can be esti- 
mated. We now estimate the detection completeness of 
our algorithm using a Monte Carlo simulation. 

Fig. 5. Comparison of the rest equivalent width measurements of Mg II λ2796 (left) and Mg II λ2803 (right) between the present work
and the Pittsburgh catalog. The contours enclose 70%, 85%, and 95% of the sample in each panel. The lower panels show the difference
$\Delta W_0 \equiv W_{0,\rm JHU} - W_{0,\rm Pitts}$. The mean differences $\langle\Delta\rangle$ and sample dispersions $\sigma_\Delta$ are shown in the upper panels for clarity. The measurements
by the two pipelines agree very well, with no systematic shift and a < 0.2 Å scatter, which is the typical measurement error.

We simulate absorbers drawn from a distribution of rest equivalent widths and doublet
ratios. For each quasar, at each pixel, we insert a fake absorber into the flux
residuals. We consider this absorber covered by the spectrum if the relevant pixels
are not masked out, and detected if its final signal passes Criterion_MgII when
compared to the convolved noise model at those pixels. With the Monte Carlo
simulation, we determine the average redshift path, given a redshift bin Δz, as the
bin width multiplied by the fraction of covered absorbers:
$\overline{\Delta z} = \Delta z\, f_{\rm covered}$. At a given rest equivalent width
and redshift, we determine the completeness $f(W_0^{\lambda 2796}, z)$ as the ratio of
the number of detected absorbers to that of covered absorbers, marginalized over all
doublet ratios. We present the completeness $f(W_0^{\lambda 2796}, z)$ in the
$W_0^{\lambda 2796}$-$z$ space in the left panel of Figure 7. In the right panels, we
present the completeness as a function of $W_0^{\lambda 2796}$ and of $z$, averaged
over all redshifts and all rest equivalent widths, in the upper and lower panels
respectively. The completeness is higher for stronger absorbers and at redshifts for
which the noise level of the flux residuals is lower. The conspicuous
low-completeness spikes in the left panel (dips in the bottom right panel) are due to
prominent sky lines, e.g., O I λ5577 and OH lines in the red. The broad
low-completeness bump at z ≈ 1.0-1.2 is caused by a combination of high-pressure
sodium emission at 5500-6100 Å in the sky light and the decreasing sensitivity at the
split between the blue and red spectrographs.⁸ Towards both ends, the sensitivity of
the SDSS spectrographs drops, thus reducing the completeness. We can now derive the
intrinsic incidence rate of Mg II absorbers from the detected absorbers by weighting
each absorber with $W_0^{\lambda 2796}$ at redshift $z$ by
$w = 1/f(W_0^{\lambda 2796}, z)$. In Figure 8 we show the observed (black) and
intrinsic (red) $W_0^{\lambda 2796}$ incidence distributions.

⁸ http://www.sdss.org/dr7/instruments/spectrographs
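
The completeness correction can be sketched in two small steps: the completeness in a bin is the detected-to-covered ratio of simulated absorbers, and each real absorber is then counted with weight 1/f. The function names are hypothetical and the binning is left to the caller.

```python
def completeness(n_detected, n_covered):
    """Fraction of covered simulated absorbers that were detected."""
    return n_detected / n_covered if n_covered > 0 else float("nan")

def corrected_count(absorber_completeness):
    """Intrinsic number of absorbers implied by a list of detections.

    absorber_completeness: the completeness f(W0, z) evaluated at the
    rest equivalent width and redshift of each detected absorber.
    """
    return sum(1.0 / f for f in absorber_completeness if f > 0)
```

Three detections with completeness 0.5, 0.5, and 1.0 stand for five intrinsic absorbers. Weights become large and noisy where f is small, which is one reason to exclude poorly covered regions from the statistics.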

4. STATISTICAL PROPERTIES 

The incidence rate $\partial^2 N/\partial z\,\partial W_0^{\lambda 2796}$ of Mg II
absorbers, i.e., the number of systems per unit redshift and rest equivalent width,
carries important information on the number density and cross-section of the absorber
systems as a function of redshift. As pointed out by Nestor et al. (2005), the
distribution of Mg II rest equivalent widths is well described by an exponential
distribution above $W_0 \sim 0.3$ Å, while weaker absorbers follow a power-law
distribution. The two populations may be described more generically using a Schechter
function (e.g., Kacprzak & Churchill 2011). Using our absorber sample we now focus on
strong Mg II absorbers and study their incidence rate
$\partial^2 N/\partial z\,\partial W_0^{\lambda 2796}$.

4.1. Rest equivalent width distribution 

We measure the incidence rate of the Mg II absorbers detected above. We estimate it
using bins with $\Delta W_0^{\lambda 2796}$ = 0.2 Å and Δz = 0.15. We start the lowest
redshift bin at z = 0.43 to avoid the region contaminated by Galactic Ca II absorption
and extend the highest redshift bin to z = 2.30 to include the highest-redshift
absorbers. We present the measurements and Poisson errors in Figure 9. For clarity, we
have shifted the measurements by successive offsets of -0.5 dex from high redshift to
low redshift. The filled circles represent strong absorbers with
$W_0^{\lambda 2796}$ > 0.6 Å, while open circles indicate weak absorbers with
0.2 Å < $W_0^{\lambda 2796}$ < 0.6 Å. The rest equivalent width distributions are
found to follow an exponential distribution at all redshifts. To summarize the overall
dependence we perform a least-squares fit to all
$\partial^2 N/\partial z\,\partial W_0^{\lambda 2796}$ data points with
0.6 Å < $W_0^{\lambda 2796}$ < 5.0 Å using an exponential functional form:

    \frac{\partial^2 N}{\partial z\,\partial W_0^{\lambda 2796}}\left(z, W_0^{\lambda 2796}\right) = \frac{N^*(z)}{W^*(z)}\, e^{-W_0^{\lambda 2796}/W^*(z)} .    (3)



We present the best-fit parameters N* and W* in Table 1 and show W* in the inset of
Figure 9. We also show the best-fit relations as dashed lines in the figure. It is
remarkable to see that the simple dependence given in Eq. 3 is able to describe 240
independent data points. Our fitting process does not include weak absorbers (shown
as open circles), which appear to be drawn from a different distribution. The
extrapolation of the exponentials clearly underestimates the incidence rate of such
systems. The scale factor of the exponential form, W*, is found to have a strong
redshift dependence. It increases up to z ∼ 1.5 and decreases beyond. We now
investigate this redshift evolution in more detail.
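
An exponential of the form of Eq. 3 becomes a straight line in the logarithm of the rate, so its two parameters can be recovered with a linear fit. The sketch below uses an unweighted fit in log space, which is a simplification and not necessarily the least-squares scheme used for Table 1; `fit_exponential` is a hypothetical name.

```python
import numpy as np

def fit_exponential(w0, rate):
    """Fit rate = (N*/W*) exp(-w0/W*) and return (N*, W*).

    Taking logs gives ln(rate) = ln(N*/W*) - w0/W*, a straight line
    whose slope is -1/W* and whose intercept is ln(N*/W*).
    """
    slope, intercept = np.polyfit(w0, np.log(rate), 1)
    w_star = -1.0 / slope
    n_star = w_star * np.exp(intercept)
    return n_star, w_star
```

Applied to noiseless values generated with N* = 1.33 and W* = 0.76 (the 1.45 < z < 1.60 row of Table 1), the fit returns those two numbers; with real data, the Poisson errors would enter as weights.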

4.2. The redshift evolution of Mg II absorbers

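We define the incidence rate of Mg II absorbers in a given range of rest equivalent
width as

    \frac{dN}{dz}\left(W_{\rm min} < W_0 < W_{\rm max}\right) = \int_{W_{\rm min}}^{W_{\rm max}} \frac{\partial^2 N}{\partial z\,\partial W_0}\, dW_0 .    (4)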

In Figure 10 we present this quantity as well as the cumulative incidence rates above
a given rest equivalent width, as a function of redshift. The incidence of absorbers
with $W_0^{\lambda 2796}$ ∼ 0.6 Å increases by less than a factor of 2 between
z = 0.5 and z = 2. In contrast, stronger absorbers experience a stronger redshift
evolution: from
Fig. 6. Left panel: the Mg II λλ2796, 2803 doublet ratio distribution. The two blue horizontal dashed lines show the two theoretical
limits 1 and 2. Most of the doublets are saturated with ratio ∼ 1, indicating that the rest equivalent width primarily measures velocity
spread rather than column density. The fraction of unsaturated ones increases towards the weaker end. Right panel: velocity dispersion
distribution of the Mg II λλ2796, 2803 doublets from the double-Gaussian fitting. The velocity dispersion scales nearly linearly with
$W_0^{\lambda 2796}$. In both panels the contours enclose 50%, 80%, and 95% of the sample.
Fig. 7. Left: the completeness $f(W_0^{\lambda 2796}, z)$ in the $W_0^{\lambda 2796}$-$z$ space. The low completeness due to sky lines, e.g., O I λ5577, the high-pressure
sodium bump at ∼5900 Å, O I λ6300, and OH lines in the red, is clearly visible. Top right: average completeness $f(W_0^{\lambda 2796})$
over all redshifts. Bottom right: average completeness $f(z)$ over all rest equivalent widths.

Fig. 8. Distribution of $W_0^{\lambda 2796}$ rest equivalent widths. The
black line shows the observed distribution and the red line shows
the intrinsic distribution after completeness correction.



TABLE 1 

Best-fit parameters of Equation [3] 



z 




< z > 


N* 


W* 


0.43 - 


0.55 


0.48 


1.11 ±0.04 


0.51 ±0.01 


0.55 - 


0.70 


0.63 


1.06 ±0.03 


0.59 ±0.01 


0.70 - 


0.85 


0.78 


1.13 ±0.03 


0.63 ±0.01 


0.85 - 


1.00 


0.93 


1.25 ±0.03 


0.63 ±0.01 


1.00 - 


1.15 


1.08 


1.21 ±0.03 


0.68 ±0.01 


1.15 - 


1.30 


1.23 


1.22 ±0.03 


0.73 ±0.01 


1.30 - 


1.45 


1.38 


1.34 ±0.03 


0.71 ±0.01 


1.45 - 


1.60 


1.53 


1.33 ±0.04 


0.76 ±0.02 


1.60 - 


1.75 


1.68 


1.32 ±0.05 


0.76 ±0.02 


1.75 - 


1.90 


1.83 


1.65 ±0.06 


0.70 ±0.02 


1.90 - 


2.05 


1.98 


1.49 ±0.08 


0.70 ±0.03 


2.05 - 


2.30 


2.13 


1.55 ±0.10 


0.66 ±0.03 



z = 0.5 to z — 1.5 the incidence rate of absorbers with 
increases by about an order of magnitude. 
Interestingly, their incidence rate then flattens out from 
z=1.5toz^2 and decreases towards higher redshift. 
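
The trend described here can be explored with the best-fit values of Table 1. The sketch below is an illustration only: it assumes that Eq. 3 has the form (N*/W*) exp(-W_0/W*), so that the cumulative rate above a threshold is N* exp(-W_min/W*), and it extrapolates the fitted exponentials rather than using the measured data points. The helper names are hypothetical.

```python
import math

# (<z>, N*, W*) copied from Table 1
TABLE1 = [
    (0.48, 1.11, 0.51), (0.63, 1.06, 0.59), (0.78, 1.13, 0.63),
    (0.93, 1.25, 0.63), (1.08, 1.21, 0.68), (1.23, 1.22, 0.73),
    (1.38, 1.34, 0.71), (1.53, 1.33, 0.76), (1.68, 1.32, 0.76),
    (1.83, 1.65, 0.70), (1.98, 1.49, 0.70), (2.13, 1.55, 0.66),
]

def cumulative_incidence(n_star, w_star, w_min):
    """dN/dz for W0 > w_min, assuming d2N/dz dW0 = (N*/W*) exp(-W0/W*)."""
    return n_star * math.exp(-w_min / w_star)

def growth_factor(w_min, z_lo=0.48, z_hi=1.53):
    """Ratio of the cumulative incidence in two Table 1 bins, labeled by <z>."""
    rows = {z: (n, w) for z, n, w in TABLE1}
    low = cumulative_incidence(*rows[z_lo], w_min)
    high = cumulative_incidence(*rows[z_hi], w_min)
    return high / low

for w_min in (0.6, 1.0, 2.0, 3.0):
    factor = growth_factor(w_min)
    print(f"W0 > {w_min:.1f} A: dN/dz changes by a factor {factor:.1f} "
          f"between <z> = 0.48 and <z> = 1.53")
```

The growth factor rises steeply with the equivalent width threshold because the threshold enters through exp(-W_min/W*), and W* itself increases with redshift over this range.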

To characterize the redshift evolution of Mg II absorbers
over a broader range of redshift we include recent
incidence rate measurements by Matejek & Simcoe
(2012). Using near-infrared data these authors have
estimated the incidence rate of Mg II absorbers up to
z = 5.5. The combined results are presented in Figure 11.
Over the entire redshift range 0.4 < z < 5.5, i.e.,
about 60% of the age of the Universe, the incidence rate
of weaker absorbers (with 0.6 Å < W_0^{λ2796} < 1.0 Å) is
roughly the same, within a factor of 2. In contrast, the
incidence rate for stronger absorbers (with W_0^{λ2796} > 1.0
Å) increases up to z ~ 2 and then decreases up to z ~ 5.
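
The statement that 0.4 < z < 5.5 spans about 60% of the age of the Universe can be checked with a short calculation. The sketch below assumes a flat ΛCDM cosmology with Ω_m = 0.3, which is an illustrative choice and not necessarily the cosmology adopted in this work; the result does not depend on the Hubble constant.

```python
import math

def age_fraction(z, omega_m=0.3):
    """Age of a flat LCDM universe at redshift z, as a fraction of its present age.

    Uses the analytic matter + Lambda solution
    t(z) = 2 / (3 H0 sqrt(omega_l)) * asinh(sqrt(omega_l / omega_m) * (1 + z)**-1.5),
    in which the prefactor cancels in the ratio.
    """
    omega_l = 1.0 - omega_m

    def t(zz):
        return math.asinh(math.sqrt(omega_l / omega_m) * (1.0 + zz) ** -1.5)

    return t(z) / t(0.0)

covered = age_fraction(0.4) - age_fraction(5.5)
print(f"0.4 < z < 5.5 spans {100 * covered:.0f}% of the present age")
```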




Fig. 9. Differential incidence rate ∂²N/∂z∂W_0^{λ2796} of Mg II absorbers.
For clarity, we have shifted the measurements by -0.5 dex
from high to low redshift. The dashed lines represent best-fit
exponential functions (Eq. 3) in the range 0.6 Å < W_0^{λ2796} < 5 Å, i.e.
including all the filled circles. Weaker absorbers follow a different
distribution. The inset shows the redshift evolution of the W*
parameter and the red solid line shows the best-fit parametrization
(Eq. 5).

Fig. 10. Cumulative (left) and differential (right) incidence rates dN/dz of Mg II absorbers. The solid lines show the best-fit
parametrization (Eq. 5). The redshift evolution is much stronger for stronger absorbers.

For those systems, the global redshift evolution of the
incidence rate appears to be very similar to the cosmic
star formation history (SFH, e.g., Hopkins & Beacom
2006; Zhu et al. 2009). This is illustrated by the dashed
line in the lower panel of Figure 11, which shows the
best-fit cosmic SFH by Hopkins & Beacom (2006). The
amplitude of this curve has been scaled to match the
amplitude of the Mg II incidence rate at z ~ 1.5. The
overall shapes of these two quantities are strikingly
similar, pointing to a direct connection between strong
absorbers and star formation. To quantify this further,
we introduce a new parametrization of the incidence
rate of Mg II absorbers combining constraints from our
SDSS-based results and higher-redshift measurements
from Matejek & Simcoe (2012). We choose a functional
form inspired by the commonly-used one for the cosmic
SFH (e.g., Cole et al. 2001; Hopkins & Beacom 2006):



$$ \frac{\partial^2 N}{\partial z\,\partial W_0^{\lambda 2796}}\left(W_0^{\lambda 2796}, z\right) = g(z)\, e^{-W_0^{\lambda 2796}/W^*(z)} , \qquad (5) $$

where

$$ g(z) = g_0\, \frac{(1+z)^{\alpha_g}}{1 + (z/z_g)^{\beta_g}} $$

and

$$ W^*(z) = W_0\, \frac{(1+z)^{\alpha_W}}{1 + (z/z_W)^{\beta_W}} , $$

in which α_g, α_W, β_g, β_W, z_g, and z_W > 0. We perform a
global least-squares fit to all ∂²N/∂z∂W_0^{λ2796} measurements
at 0.6 Å < W_0^{λ2796} < 5 Å at all redshifts with the
parametrization above. We note that given their large
error bars, the near-infrared high-redshift measurements
contribute only weakly to the fit. The constraints are
dominated by the more precise measurements presented
in this work. The best-fit parameters and their formal
errors are given in Table 2 and the corresponding
incidence rates are shown with solid lines in Figures 10 and
11. In both cases the parametrization given in Eq. 5 is an
accurate representation of the data points over the
entire redshift range. In addition we accurately reproduce
the redshift evolution of W*(z), introduced in Eq. 3, as
shown in the inset of Figure 9.

TABLE 2
Best-fit parameters of Equation 5

g_0            α_g            z_g            β_g
0.63 ± 0.39    5.38 ± 1.08    0.41 ± 0.06    2.97 ± 0.59

W_0            α_W            z_W            β_W
0.33 ± 0.03    1.21 ± 0.19    2.24 ± 0.28    2.43 ± 0.25
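
The parametrization can be evaluated directly from the Table 2 values. The sketch below is an illustration and not the code used for the fit. It assumes that g(z) and W*(z) both take the form norm × (1+z)^α / (1 + (z/z_turn)^β), and that Eq. 5 is g(z) exp(-W_0/W*(z)). The function names are hypothetical, and the formal errors on the parameters are ignored.

```python
import math

# Best-fit values copied from Table 2 (formal errors ignored)
G0, ALPHA_G, Z_G, BETA_G = 0.63, 5.38, 0.41, 2.97
W0, ALPHA_W, Z_W, BETA_W = 0.33, 1.21, 2.24, 2.43

def _rise_and_fall(z, norm, alpha, z_turn, beta):
    """norm * (1 + z)**alpha / (1 + (z / z_turn)**beta), the assumed shape."""
    return norm * (1.0 + z) ** alpha / (1.0 + (z / z_turn) ** beta)

def g(z):
    """Normalization g(z) of Eq. 5."""
    return _rise_and_fall(z, G0, ALPHA_G, Z_G, BETA_G)

def w_star(z):
    """Exponential scale W*(z) in Angstrom."""
    return _rise_and_fall(z, W0, ALPHA_W, Z_W, BETA_W)

def differential_incidence(w, z):
    """Eq. 5: d2N / dz dW0 at rest equivalent width w (Angstrom) and redshift z."""
    return g(z) * math.exp(-w / w_star(z))

def cumulative_incidence(w_min, z):
    """dN/dz for W0 > w_min: Eq. 5 integrated from w_min to infinity."""
    return g(z) * w_star(z) * math.exp(-w_min / w_star(z))

for z in (0.5, 1.0, 2.0, 3.0, 5.0):
    print(f"z = {z:.1f}: W* = {w_star(z):.2f} A, "
          f"dN/dz(W0 > 1 A) = {cumulative_incidence(1.0, z):.3f}")
```

As a consistency check, `w_star(1.08)` can be compared with the Table 1 entry for the 1.00 - 1.15 bin, and `g(z)` with the corresponding ratio N*/W*.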

4.3. Discussion 

Using about 35,000 intervening Mg II absorbers from
the SDSS and near-infrared data from Matejek & Simcoe
(2012) we have shown that the evolution of the
incidence rate of strong Mg II absorbers is very similar
to that of the cosmic SFH over the entire range
0.4 < z < 5.5. This is in agreement with previous
results (e.g., Bergeron & Boissé 1991; Nestor et al. 2005;
Prochter et al. 2006) but now shown with a much higher
precision.

Several studies have suggested a connection between
strong Mg II absorbers and star formation.
Bergeron & Boissé (1991) showed that most galaxies
identified with Mg II absorbers in their sample are fairly
blue and show [O II] emission. Norman et al. (1996)
detected strong Mg II absorption arising from gas around
the starburst galaxy NGC 520. Using near-infrared integral
field spectroscopy, Bouché et al. (2007) detected strong
Hα emission around 14 out of 21 strong Mg II absorbers
with W_0^{λ2796} > 2 Å. Nestor et al. (2011) studied galaxies
around two strong Mg II absorbers with W_0^{λ2796} > 3
Å, and found that they are both associated with bright
emission-line galaxies with large specific star formation
rates for their masses. Ménard et al. (2011) showed that
the mean [O II] luminosity density traced by a sample of
about 8,500 Mg II absorbers from the SDSS follows that
of the cosmic SFH. A number of studies of galaxy spectra
also support the connection between absorbers and
star formation. Tremonti et al. (2007) detected Mg II
outflows in 10 out of 14 post-starburst galaxies. Using
galaxy spectra from the DEEP2 survey, Weiner et al.
(2009) showed that blue-shifted Mg II absorption is
ubiquitous in star-forming galaxies at z ~ 1.4, and that the
Mg II equivalent width and outflow velocity increase with
stellar mass and star formation rate. Rubin et al. (2010)
extended the analysis to lower redshift at 0.7 < z < 1.5
and reached a similar conclusion. More recently, using
stacked spectra of background galaxies, Bordoloi et al.
(2011) studied the radial and azimuthal distribution of
Mg II gas of galaxies at 0.5 < z < 0.9. They showed that
blue galaxies have a significantly larger average Mg II
equivalent width at close galactocentric radii than red
galaxies. They also showed that the average Mg II
equivalent width is larger at larger azimuthal angle, indicating
the presence of a strongly bipolar outflow aligned with
the disk rotation axis. Our results further support the
connection between strong Mg II absorbers and star
formation, across a wide range of redshifts.

Fig. 11. Incidence rate dN/dz of Mg II absorbers with 0.6 Å < W_0^{λ2796} < 1.0 Å (top) and W_0^{λ2796} > 1.0 Å (bottom). The blue points,
from the present study based on SDSS data, are our measurements at moderate redshifts, while the green points show near-infrared
measurements from Matejek & Simcoe (2012). The blue solid lines show the global fit of Eq. 5 to all ∂²N/∂z∂W_0^{λ2796} data points. In the
lower panel, the dashed line shows the best-fit cosmic star formation history by Hopkins & Beacom (2006), scaled to match dN/dz at
z ~ 1.5.

5. SUMMARY

The Mg II λλ2796, 2803 absorption line doublet probes
low-ionization and neutral gas in the Universe. We have
developed a generic and fully-automatic algorithm to
detect absorption lines in the spectra of astronomical
sources. The estimation of the flux continuum is based
on nonnegative matrix factorization (NMF), a vector
decomposition technique similar to principal component
analysis (PCA) but with the additional requirement of
nonnegativity. We then applied this algorithm to a
sample of about 100,000 quasar spectra from the SDSS DR7
dataset. Our results are summarized as follows:

• We detected 40,429 Mg II absorbers, with 35,752
intervening systems, defined as z_abs < z_qso - 0.04,
corresponding to a Δv ~ 12,000 km s^{-1}. This
doubles the size of previously published Mg II
catalogs. The dataset is available at
http://www.pha.jhu.edu/~gz323/jhusdss
Future updates including new data releases can be
found at the same address.

• We determined the completeness and purity of
our line detection algorithm and validated it with
the visually-inspected Pittsburgh Mg II catalog
(Quider et al. 2011, based on the SDSS DR4
subset).

• We measured the differential incidence rate
∂²N/∂z∂W_0^{λ2796} of Mg II absorbers: the rest
equivalent width distribution of systems with W_0^{λ2796} >
0.6 Å is well-represented by an exponential at all
redshifts. The shape of this distribution changes
for weaker absorbers. Combining our SDSS-based
results and near-infrared measurements of Mg II
absorbers by Matejek & Simcoe (2012) we
introduced a new parametrization of the differential
incidence rate ∂²N/∂z∂W_0^{λ2796} of Mg II absorbers
(Eq. 5), valid over the entire redshift range 0.4 <
z < 5.5.

• Over this entire redshift range, which covers more
than 60% of the age of the Universe, the incidence
rate evolution of strong absorbers (with W_0^{λ2796} > 1 Å)
is strikingly similar to the cosmic star formation
history, suggesting a direct link between these two
quantities.

The algorithm presented in this work is generic and
can easily be used in other contexts. It is not
limited to quasars but can estimate the continuum flux of
any ensemble of sources, for example galaxies. It can
also be used to detect any other line, in absorption or
in emission and at any rest-frame wavelength. It is
readily applicable to upcoming surveys such as eBOSS
(Comparat et al. 2012), BigBOSS (Schlegel et al. 2009),
and PFS (Ellis et al. 2012).



We thank Alex Szalay, Tamas Budavari, and Ani 
Thakar for sharing their computational resources. We 
have made extensive use of SDSS IDL libraries written by 
David Schlegel, Michael Blanton, David Hogg, and others.
We also acknowledge the usage of the MPFIT package
written by Craig Markwardt. The authors acknowledge
funding support from NSF grant AST-1109665 and
the Alfred P. Sloan Foundation. Funding for the SDSS
and SDSS-II has been provided by the Alfred P. Sloan 
Foundation, the Participating Institutions, the National 
Science Foundation, the U.S. Department of Energy, 
the National Aeronautics and Space Administration, the 
Japanese Monbukagakusho, the Max Planck Society, and 
the Higher Education Funding Council for England. The 
SDSS Web Site is http://www.sdss.org/. 



REFERENCES 



Abazajian, K. N., Adelman-McCarthy, J. K., Agüeros, M. A., et al. 2009, ApJS, 182, 543
Bergeron, J., & Boissé, P. 1991, A&A, 243, 344
Blanton, M. R., & Roweis, S. 2007, AJ, 133, 734
Bordoloi, R., Lilly, S. J., Knobel, C., et al. 2011, ApJ, 743, 10
Bouché, N., Murphy, M. T., Péroux, C., Csabai, I., & Wild, V. 2006, MNRAS, 371, 495
Bouché, N., Murphy, M. T., Péroux, C., et al. 2007, ApJ, 669, L5
Caulet, A. 1989, ApJ, 340, 90
Chelouche, D., & Bowen, D. V. 2010, ApJ, 722, 1821
Chen, H.-W., Helsby, J. E., Gauthier, J.-R., et al. 2010a, ApJ, 714, 1521
Chen, H.-W., Wild, V., Tinker, J. L., et al. 2010b, ApJ, 724, L176
Churchill, C. W., Mellon, R. R., Charlton, J. C., et al. 2000a, ApJS, 130, 91
Churchill, C. W., Mellon, R. R., Charlton, J. C., et al. 2000b, ApJ, 543, 577
Churchill, C. W., Rigby, J. R., Charlton, J. C., & Vogt, S. S. 1999, ApJS, 120, 51
Cole, S., Norberg, P., Baugh, C. M., et al. 2001, MNRAS, 326, 255
Comparat, J., Kneib, J.-P., Escoffier, S., et al. 2012, MNRAS, 104
Connolly, A. J., Szalay, A. S., Bershady, M. A., Kinney, A. L., & Calzetti, D. 1995, AJ, 110, 1071
Ellis, R., Takada, M., Aihara, H., et al. 2012, ArXiv e-prints
Hewett, P. C., & Wild, V. 2010, MNRAS, 405, 2302
Hopkins, A. M., & Beacom, J. F. 2006, ApJ, 651, 142
Kacprzak, G. G., & Churchill, C. W. 2011, ApJ, 743, L34
Kacprzak, G. G., Churchill, C. W., Barton, E. J., & Cooke, J. 2011a, ApJ, 733, 105
Kacprzak, G. G., Churchill, C. W., Ceverino, D., et al. 2010, ApJ, 711, 533
Kacprzak, G. G., Churchill, C. W., Evans, J. L., Murphy, M. T., & Steidel, C. C. 2011b, MNRAS, 416, 3118
Lanzetta, K. M., Turnshek, D. A., & Wolfe, A. M. 1987, ApJ, 322, 739
Lee, D. D., & Seung, H. S. 1999, Nature, 401, 788
Lundgren, B. F., Brunner, R. J., York, D. G., et al. 2009, ApJ, 698, 819
Matejek, M. S., & Simcoe, R. A. 2012, ArXiv e-prints
Ménard, B., Wild, V., Nestor, D., et al. 2011, MNRAS, 417, 801
Nestor, D. B., Johnson, B. D., Wild, V., et al. 2011, MNRAS, 412, 1559
Nestor, D. B., Turnshek, D. A., & Rao, S. M. 2005, ApJ, 628, 637
Norman, C. A., Bowen, D. V., Heckman, T., Blades, C., & Danly, L. 1996, ApJ, 472, 73
Prochter, G. E., Prochaska, J. X., & Burles, S. M. 2006, ApJ, 639, 766
Quider, A. M., Nestor, D. B., Turnshek, D. A., et al. 2011, AJ, 141, 137
Rubin, K. H. R., Prochaska, J. X., Koo, D. C., Phillips, A. C., & Weiner, B. J. 2010, ApJ, 712, 574
Sargent, W. L. W., Steidel, C. C., & Boksenberg, A. 1988, ApJ, 334, 22
Schlegel, D., White, M., & Eisenstein, D. 2009, in ArXiv Astrophysics e-prints, Vol. 2010, astro2010: The Astronomy and Astrophysics Decadal Survey, 314
Schneider, D. P., Richards, G. T., Hall, P. B., et al. 2010, AJ, 139, 2360
Shen, Y., & Ménard, B. 2012, ApJ, 748, 131
Steidel, C. C., Dickinson, M., & Persson, S. E. 1994, ApJ, 437, L75
Steidel, C. C., & Sargent, W. L. W. 1992, ApJS, 80, 1
Tremonti, C. A., Moustakas, J., & Diamond-Stanic, A. M. 2007, ApJ, 663, L77
Tytler, D., Boksenberg, A., Sargent, W. L. W., Young, P., & Kunth, D. 1987, ApJS, 64, 667
Vanden Berk, D., Khare, P., York, D. G., et al. 2008, ApJ, 679, 239
Vanden Berk, D. E., Richards, G. T., Bauer, A., et al. 2001, AJ, 122, 549
Weiner, B. J., Coil, A. L., Prochaska, J. X., et al. 2009, ApJ, 692, 187
Weymann, R. J., Williams, R. E., Peterson, B. M., & Turnshek, D. A. 1979, ApJ, 234, 33
Wild, V., Hewett, P. C., & Pettini, M. 2006, MNRAS, 367, 211
Yan, R. 2011, AJ, 142, 153
Yip, C. W., Connolly, A. J., Vanden Berk, D. E., et al. 2004, AJ, 128, 2603
York, D. G., Adelman, J., Anderson, Jr., J. E., et al. 2000, AJ, 120, 1579
York, D. G., Khare, P., Vanden Berk, D., et al. 2006, MNRAS, 367, 945
Zhu, G., Moustakas, J., & Blanton, M. R. 2009, ApJ, 701, 86



APPENDIX 

A. CONSTRUCTION OF THE NMF BASIS SETS 

In order to create a basis set of eigenspectra we choose to use flux-normalized quasar spectra. To do so we use
four different wavelength ranges where quasar spectra are relatively featureless. This choice is based on the median
quasar spectral energy distribution (SED) given in Vanden Berk et al. (2001) and summarized in Table A1. For each
normalization wavelength range, we normalize the observed spectra with the mean flux within the range. We create a
basis set for each range using all quasars with the range fully covered in the spectra. When fitting the continuum of
a given quasar, we choose the basis set of eigenspectra whose median redshift is closest to the quasar redshift. This
guarantees that each quasar is fit with a set of eigenspectra that are built from the maximal number of available
quasars covering the same wavelengths.

TABLE A1
Normalization schemes used in the NMF fitting

Normalization wavelength range (a)    Eigenspectra construction redshift range (b)    Continuum fitting redshift range (c)
4150 Å - 4250 Å                       z < 1.0                                         z < 0.6
3020 Å - 3100 Å                       0.4 < z < 1.8                                   0.6 < z < 1.0
2150 Å - 2250 Å                       0.8 < z < 2.8                                   1.0 < z < 2.5
1420 Å - 1500 Å                       2.0 < z < 4.8                                   2.5 < z < 4.7

(a) When constructing the basis set of eigenspectra, we choose to work on flux-normalized quasar spectra. We choose these
normalization wavelength ranges where quasar spectra are relatively featureless.

(b) The redshift ranges where the normalization wavelength range is covered by SDSS. We make use of all available quasars
within each range to construct the basis set of eigenspectra.

(c) For quasars within these redshift ranges, we fit their observed spectra with the basis set of eigenspectra constructed with
quasars in the second column. These ranges are chosen so that each quasar is fit with the basis set of eigenspectra constructed
using the maximal number of quasars that cover the same wavelengths.
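
The normalization step can be sketched as follows. This is a minimal illustration, not the pipeline code. It assumes that the normalization ranges of Table A1 are rest-frame wavelengths, that the continuum-fitting redshift ranges are applied as half-open intervals, and that the first range starts at z = 0. The function names are hypothetical.

```python
# (rest-frame normalization range in Angstrom, continuum-fitting redshift range),
# copied from Table A1; the lower bound 0.0 of the first range is an assumption.
SCHEMES = [
    ((4150.0, 4250.0), (0.0, 0.6)),
    ((3020.0, 3100.0), (0.6, 1.0)),
    ((2150.0, 2250.0), (1.0, 2.5)),
    ((1420.0, 1500.0), (2.5, 4.7)),
]

def pick_normalization_range(z_qso):
    """Return the rest-frame normalization range used for a quasar at z_qso."""
    for wave_range, (z_lo, z_hi) in SCHEMES:
        if z_lo <= z_qso < z_hi:
            return wave_range
    raise ValueError(f"no normalization scheme covers z = {z_qso}")

def normalize(obs_wave, flux, z_qso):
    """Divide a spectrum by its mean flux inside the normalization range."""
    lo, hi = pick_normalization_range(z_qso)
    rest_wave = [w / (1.0 + z_qso) for w in obs_wave]
    inside = [f for w, f in zip(rest_wave, flux) if lo <= w <= hi]
    if not inside:
        raise ValueError("spectrum does not cover the normalization range")
    mean_flux = sum(inside) / len(inside)
    return [f / mean_flux for f in flux]

# Toy spectrum at z = 1.0: the 2150-2250 Angstrom range maps to 4300-4500 Angstrom observed.
obs_wave = [4000.0, 4300.0, 4400.0, 4500.0, 5000.0]
flux = [3.0, 2.0, 2.0, 2.0, 1.0]
print(normalize(obs_wave, flux, z_qso=1.0))
```

A real spectrum would also carry per-pixel errors and masks, which this sketch omits.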

As an example, Figure A1 shows the NMF basis set of 12 eigenspectra in the redshift bin with 0.4 < z < 1.8, for the
normalization wavelength range 3020-3100 Å. In the first five panels, we also label the prominent features such
as permitted metal emission lines, forbidden lines, and the Balmer series. The natural separation of different types of
emission lines illustrates the power of the NMF vector decomposition to characterize quasar spectra.
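
For readers unfamiliar with NMF, the following sketch shows the basic unweighted multiplicative-update scheme of Lee & Seung (1999) on a toy data set. It is not the implementation used to build the basis sets in this work, which must also handle noise and missing pixels; the matrix sizes, the function names, and the toy templates are illustrative assumptions.

```python
import random

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frob_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor a nonnegative V (n_spectra x n_pixels) as W H with W, H >= 0.

    Rows of H play the role of eigenspectra; rows of W are the coefficients.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

# Toy data: four "spectra" built from two nonnegative templates.
templates = [[1.0, 2.0, 3.0, 2.0, 1.0, 0.5], [0.5, 0.5, 1.0, 2.0, 3.0, 2.0]]
coeffs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.5]]
V = matmul(coeffs, templates)

w_fit, h_fit = nmf(V, rank=2)
print(f"squared reconstruction error: {frob_error(V, w_fit, h_fit):.3e}")
```

Because every update multiplies nonnegative numbers, the factors stay nonnegative without any explicit constraint, which is what distinguishes NMF from PCA.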

B. COMPARISON WITH THE PITTSBURGH CATALOG 

The Pittsburgh catalog (Quider et al. 2011; Nestor et al. 2005), which uses visual inspection, provides a valuable
reference to validate our pipeline. Here we present a detailed comparison between the two Mg II absorber catalogs.
There are 41,881 common quasars searched for Mg II doublets in both surveys. Within the search window, we detected
18,748 Mg II absorbers with W_0^{λ2796} > 0.02 Å, while the Pittsburgh pipeline led to 14,715 objects. In Figure B1
we show the fraction of absorbers detected in both surveys as a function of W_0^{λ2796}. In the left panel, we show that
among the 14,715 detected by the Pittsburgh pipeline, we recovered 14,079 (~ 95%). For the remaining ~ 5% the
noise level of the residuals at the location of the absorbers is too high for our Criterion_MgII (Eq. 1) to be satisfied.
Such regions of the spectra are not included in our redshift path and these non-detections do not affect our statistical
analysis. If we include absorber systems that do not pass Criterion_MgII but satisfy Criterion_FeII (Eq. 2), we
recover close to 100% of the strong absorbers detected by the Pittsburgh pipeline, as shown by the red line in the left
panel of Figure B1.
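
The counts quoted in this appendix can be cross-checked with simple arithmetic. The sketch below only re-derives fractions from numbers stated in the text, including the subsample sizes given in the next paragraph. The variable names are hypothetical.

```python
# Counts quoted in Appendix B
pittsburgh_total = 14715   # absorbers reported by the Pittsburgh pipeline
recovered = 14079          # of those, also detected by the JHU pipeline
jhu_total = 18748          # absorbers detected by the JHU pipeline
jhu_only = 4669            # JHU systems not reported in the Pittsburgh catalog

# Sizes of the four composite-spectrum subsamples, keyed by W0 range in Angstrom
composite_bins = {"0.2-0.8": 2971, "0.8-1.1": 831, "1.1-1.5": 469, "1.5-4.0": 379}

recovery_fraction = recovered / pittsburgh_total
extra_fraction = jhu_only / jhu_total
in_composites = sum(composite_bins.values())
outside_composites = jhu_only - in_composites

print(f"recovered by JHU:      {100 * recovery_fraction:.1f}%")
print(f"JHU-only systems:      {100 * extra_fraction:.1f}%")
print(f"in composite spectra:  {in_composites}")
print(f"outside 0.2-4.0 A:     {outside_composites}")
```

The JHU-only count equals the JHU total minus the recovered Pittsburgh systems, so the quoted numbers are mutually consistent.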

In the right panel of the figure, we show the fraction of absorbers of our catalog that are also detected by the
Pittsburgh pipeline. It shows that 4669 of the systems we detected are not reported in the Pittsburgh catalog. The
fraction of missing systems is a function of rest equivalent width and increases for weaker systems. We now demonstrate
that these absorbers are bona fide Mg II absorbers. For the strongest systems (W_0^{λ2796} > 4 Å) we visually inspected the
spectra and, based on the presence of additional metal lines, we were able to confirm the nature of the systems. This is
illustrated in Figure B3, where we have labeled the six strongest lines: Mg II λλ2796, 2803, Fe II λ2600, λ2586, λ2383, and
λ2344. Nearly all Mg II absorbers can be confirmed with the Fe II lines at the right locations. For weaker absorbers, we
construct composite spectra. We divide the sample of absorbers missing in the Pittsburgh catalog into four subsamples
with 0.2 Å < W_0^{λ2796} < 0.8 Å, 0.8 Å < W_0^{λ2796} < 1.1 Å, 1.1 Å < W_0^{λ2796} < 1.5 Å, and 1.5 Å < W_0^{λ2796} < 4.0 Å. The
numbers of absorbers in each subsample are 2971, 831, 469, and 379, respectively. The composite spectra shown in
Figure B2 display all the expected metal absorption lines, which shows that the absorbers detected by our pipeline but
not reported in the Pittsburgh catalog are real Mg II absorption-line systems. This confirms that our pipeline leads
to robust detections of Mg II absorbers.

Fig. A1. The NMF basis set of eigenspectra at 0.4 < z < 1.8. We label the permitted metal emission lines, forbidden lines, and Balmer
series in the first five panels.

Fig. B1. Comparison of the JHU and Pittsburgh catalogs. The black line in the left panel shows the fraction of Mg II absorbers
reported by the Pittsburgh pipeline that are also detected by our pipeline. To guide the eye, we overplot two horizontal dotted lines at
100% and 90%. Within the search window, we recovered 14,079 (~ 95%) of 14,715 absorbers reported in the Pittsburgh catalog. The
missing absorbers are due to low SNR or masked pixels and did not pass Criterion_MgII. If we also include Fe II absorbers that satisfy
Criterion_FeII, we recovered close to 100% of strong absorbers. In the right panel, we show the fraction of Mg II absorbers in the JHU
catalog that are also included in the Pittsburgh catalog. We detected 18,748 in total, with 4669 (~ 25%) systems not reported in the
Pittsburgh catalog. In Figures B3 and B2 we show that these extra absorbers we found are bona fide absorbers.

Fig. B2. Composite spectra of absorbers detected by the JHU pipeline but not by the Pittsburgh pipeline. We divided the sample
of absorbers with W_0^{λ2796} < 4.0 Å into four subsamples with 0.2 Å < W_0^{λ2796} < 0.8 Å (the number of absorbers N = 2971), 0.8
Å < W_0^{λ2796} < 1.1 Å (N = 831), 1.1 Å < W_0^{λ2796} < 1.5 Å (N = 469), and 1.5 Å < W_0^{λ2796} < 4.0 Å (N = 379). We have labeled the
locations of prominent absorption lines to guide the eye. The expected absorption lines show these systems are real absorbers.

Fig. B3. Individual spectra of absorbers with W_0^{λ2796} > 4.0 Å detected by the JHU pipeline but not by the Pittsburgh pipeline. In
the left panels we show the observed quasar spectra in the observer frame, while in the right panels we show the final residuals in the
absorber frame. We show the IAU-formatted names of the quasars in the left panels. We also label the locations of the Mg II λλ2796, 2803 and
Fe II λ2600, λ2586, λ2383, λ2344 lines. To guide the eye, we overplot a gray dashed line at unity in the right panels. Nearly all of these
systems are real absorbers that can be confirmed with Fe II lines.



